Transcriptomic analysis of paired healthy human skeletal muscles to identify modulators of disease severity in DMD

Muscle damage and fibro-fatty replacement of skeletal muscles is a main pathologic feature of Duchenne muscular dystrophy (DMD) with more proximal muscles affected earlier and more distal affected later in the disease course, suggesting that different skeletal muscle groups possess distinctive characteristics that influence their susceptibility to disease. To explore transcriptomic factors driving differential gene expression and modulating DMD skeletal muscle severity, we characterized the transcriptome of vastus lateralis (VL), a more proximal and susceptible muscle, relative to tibialis anterior (TA), a more distal and protected muscle, in 15 healthy individuals using bulk RNA sequencing to identify gene expression differences that may mediate their relative susceptibility to damage with loss of dystrophin. Matching single nuclei RNA sequencing data was generated for 3 of the healthy individuals, to infer cell composition in the bulk RNA sequencing dataset and to improve mapping of differentially expressed genes to their cell source of expression. A total of 3,410 differentially expressed genes were identified and mapped to cell type using single nuclei RNA sequencing of muscle, including long non-coding RNAs and protein coding genes. There was an enrichment of genes involved in calcium release from the sarcoplasmic reticulum, particularly in the myofibers and these myofiber genes were higher in the VL. There was an enrichment of genes in “Collagen-Containing Extracellular Matrix” expressed by fibroblasts, endothelial, smooth muscle and pericytes, with most genes higher in the TA, as well as genes in “Regulation Of Apoptotic Process” expressed across all cell types. Previously reported genetic modifiers were also enriched within the differentially expressed genes. We also identify 6 genes with differential isoform usage between the VL and TA. Lastly, we integrate our findings with DMD RNA sequencing data from the TA, and identify “Collagen-Containing Extracellular Matrix” and “Negative Regulation Of Apoptotic Process” as differentially expressed between DMD compared to healthy. Collectively, these findings propose novel candidate mechanisms that may mediate differential muscle susceptibility in muscular dystrophies and provide new insight into potential therapeutic targets.


Introduction
Duchenne muscular dystrophy (DMD) is the most common progressive muscular dystrophy with childhood onset, and is caused by loss of function mutations in DMD (Hoffman et al., 1987), leading to profound weakness and premature death, mainly from cardiorespiratory failure. DMD encodes dystrophin, which plays a critical structural role in skeletal and cardiac muscle fibers by linking the intra-myofiber F-actin of the Z-disk to the extracellular matrix through binding components of the dystrophin-associated glycoprotein complex at the muscle membrane (Hoffman et al., 1987;Way et al., 1992). Absence of dystrophin in skeletal muscle leads to greater susceptibility to damage from contraction-induced injury (Petrof et al., 1993), resulting in leakage of calcium into the myofiber with a plethora of downstream consequences ultimately leading to myofiber death and replacement with fat and fibrosis. Fibroblasts, immune cells, and muscle stem cells are expanded, changing the extracellular matrix (Scripture-Adams et al., 2022). A large number of other muscular dystrophies have had their genetic basis decoded and many are components of the dystrophinglycoprotein complex (Cohen et al., 2021), including Limb-girdle muscular dystrophies (LGMDs) with similar patterns of muscle loss from proximal to more distal.
While DMD is always degenerative and leads to premature death, variation in disease progression between individuals in DMD has been used to identify genetic factors correlated with disease severity or progression. Disease severity is mitigated with residual dystrophin expression which usually results in slowing of disease progression (Fanin et al., 1995). However, even in cases of siblings with DMD who have the same DMD mutation, there can be discordance in the progression (Pettygrove et al., 2014), indicating that environmental or other genetic factors may modify disease severity. Various studies use variability in age at loss of ambulation (LOA) (Pegoraro et al., 2011;Flanigan et al., 2013;Bello et al., 2016;Weiss et al., 2018;Spitali et al., 2020) to identify variants associated with disease progression.
The overall pattern of sequentially affected muscles in DMD is highly similar across affected individuals and describes a distinctive pattern of progression with more proximal muscles affected earlier than more distal muscles (Rooney et al., 2020), suggesting that constitutive differences in the formation of those muscle groups encode factors that influence relative myofiber susceptibility to damage. Therefore, the study of healthy muscle may provide insights into susceptibility mechanisms in disease. An extreme example of protected striated muscles in DMD across multiple species are the extraocular muscles (EOM) (Karpati et al., 1988;Valentine et al., 1990;Kaminski et al., 1992). However, the functional requirements of EOM are substantially different from limb skeletal muscle. EOM have multiple innervated fibers, compartmentalization of layers with different fiber types, expression of EOM-specific myosin isoforms (encoded by MYH13, MYH15), and partial retention of embryonic and neonatal myosin expression in mature fibers (Porter, 2002). Differential gene expression studies in mouse (Porter et al., 2001) and rat (Fischer et al., 2002) highlighted calcium homeostasis, mitochondrial genes, lipid catabolism, immune processes, apoptosis, and extracellular matrix.
Here we study healthy muscle tissue from vastus lateralis (VL) and tibialis anterior (TA), to identify genes that alter myofiber susceptibility to fibrofatty replacement in DMD individuals, using paired samples from 15 donors to control for interindividual and age differences. While TA has a much more modest degree of protection from disease progression than EOM, TA is substantially and consistently protected from ongoing muscle damage in DMD relative to VL from longitudinal imaging and spectroscopy data of children with DMD (Rooney et al., 2020). We reasoned that the differential expression analysis of VL and TA in healthy individuals would provide insight into protective mechanisms relevant in the absence of dystrophin. The difference in progression is substantial. VL progresses faster than TA with an about 8.5year longer time for the TA to attain similar levels of damage as the VL (Rooney et al., 2020). In this transcriptomic study, we sampled VL and TA at a single timepoint from healthy young adult volunteers. We report differentially expressed genes and map differentially expressed genes to specific intra-muscular cell types using single nuclei analyses.

Muscle biopsies
Fifteen healthy individuals (age range 18-26 years) with no history of muscle or other chronic or acute disease were consented on UCLA protocol IRB#18-001366. Eight ambulatory DMD patients with a confirmed nonsense DMD mutation were consented on UCLA protocol IRB#11-001087 (age range 2-7 years). All biopsies were obtained using a Vacora (Bard) vacuum-assisted core needle from the VL and TA as previously described (Barthelemy et al., 2020). In brief, before the biopsies, the participant's leg was observed via ultrasound to ensure that the muscle showed no excess fat or blood vessels nearby. VL sample was obtained from about two-thirds of the muscle length, and the TA from about one-third of the muscle length. We chose muscle pieces that had similar muscle appearance without visible connective tissue to reduce sample variability. Each needle muscle biopsy core (about 125 mg) was dissected into about 25 mg pieces and flash frozen in liquid nitrogen within tissue cassettes within 5 min of excision and stored in liquid nitrogen until RNA extraction or sectioning for histological examination.

RNA sequencing
Frozen skeletal muscle (8-25 mg) was homogenized on ice in 500 µL of Trizol for RNA extraction using standard protocols (Lee et al., 2020). RNA quality was recorded by the RNA integrity number (RIN) using the Agilent RNA 6000 Nano chips. Healthy muscle RNA samples with RIN above 7 and DMD muscle RNA samples with RIN above 4 were used to prepare cDNA libraries with ribosomal RNA depletion using the KAPA RiboErase Kit (HMR) (Roche). About 50 million 150-151 bp paired-end RNA sequencing (RNAseq) reads were generated per RNA sample using Illumina Novaseq 6000 S4. Sequencing reads were aligned to GRCh38 (Ensembl 105, Gencode v39) using STAR 2.6.0c (Dobin et al., 2012;Lee et al., 2020). Data quality control included alignment metrics (ribosomal and globin RNA, aligned and unmapped reads, sequencing depth), hierarchical clustering, principal component analysis and Pearson correlation.

Single nuclei isolation and RNA sequencing
Single nuclei were isolated from a subset of 3 paired male healthy VL and TA frozen muscle and sequenced using the 10X Genomics platform as described previously (Scripture-Adams et al., 2022). Six to twelve 40 µM cross sections of frozen muscle biopsies were collected in a sterile tube to estimate a total of 3 mg of skeletal muscle, dounced with two cycles of strokes (one with a loose douncer followed by one with a tight douncer) in 1% bovine serum albumin (BSA) in phosphate-buffered saline (PBS) with 100 U/mL of type IV collagenase and 0.5 U/µL RNAse inhibitor, and stained with 10 μg/mL DAPI. The nuclei were sorted by fluorescence-activated cell sorting (FACS) to separate from debris and create a pure nuclear preparation prior to library preparation. 10X Chromium Single cell 3′ v3 libraries were prepared and sequenced on Illumina Novaseq 6000 S2 (2 × 50 bp) (10X Genomics). Single nuclei RNA sequencing (snRNAseq) reads were aligned to GRCh38 (Ensembl 105, Gencode v39) using Cell Ranger (10X Genomics). Data was aggregated for downstream processing and analysis. Initial cell clustering was performed using k-means within Cell Ranger (10X Genomics). Nuclear doublets were identified using DoubletFinder (version 2.0.3) (McGinnis et al., 2019) with a doublet rate of detection of 15%. Doublets as well as nuclei with 200 or fewer unique molecular identifiers (UMI), were excluded from downstream analysis. Re-clustering was performed after data filtering, and clustered nuclei populations were identified using known cell-type marker genes via Loupe Browser (version 6.0.0) (10X Genomics). Downstream analysis and statistical testing of differentially expressed genes across cell types was performed using the R package Seurat (version 4.0.2) (Hao et al., 2021) and the Wilcoxon statistical test. UMI-normalized average expression across cell types was obtained from Seurat's AverageExpression function, which returns the average number of transcripts per 10,000 transcripts (TP10K).

Cell deconvolution using single nuclei RNA sequencing
Raw bulk RNAseq read counts for were obtained from the STAR alignment (version 2.6.0c) (Dobin et al., 2012) and batch-corrected for the two sequencing runs using CombatSeq (sva version 3.38.0) (Zhang et al., 2020). Differential gene expression analysis across cell types in the snRNAseq dataset identified statistically significant (adjusted p-value <0.05) marker genes for each cell type. Highly specific markers for a specific cell type were defined as those expressed in less than 10% (for large cell populations) or 1% (for small cell populations) of the other cell types. A list of 69 cell-specific genes was obtained after further manual curation. Estimated cell proportions for each sample were obtained with CIBERSORTx (Newman et al., 2019) using the average expression of these 69 cell-specific genes. The parameters used for CIBERSORTx were: Job type = Impute Cell Fractions, Batch correction = disabled, Disable quantile normalization = true, Run mode = relative, Permutations = 100.

Differential gene expression analysis
The R package DESeq2 (version 1.30.1) (Love et al., 2014) was used to perform differential gene expression analysis using the raw read counts. The covariates included in the healthy VL versus TA analysis design were: participant study ID, RIN, and batch. The covariates included in the DMD versus Healthy analysis design were: batch, RIN, age, and sex. Multiple testing adjustment was done within DESeq2 using Benjamini-Hochberg for a false discovery rate (FDR) of less than 0.05.
Functional enrichment analysis of differentially expressed genes was performed for all differentially expressed genes (independent of their direction of highest expression) using EnrichR (https://maayanlab.cloud/Enrichr/, (Chen et al., 2013)), with all expressed genes included in the DESeq2 analysis as background. For the genes differentially expressed between VL and TA, we tested 4,701 terms from GO Biological Process 2023, and 408 terms from GO Cellular Component 2023. For genes differentially expressed between DMD and healthy, we tested 3,133 terms from GO Biological Process 2023, and 272 terms from GO Cellular Component 2023. Significant gene ontology (GO) terms (adjusted p-value <0.05) for the VL versus TA analysis were further summarized with ReviGO (http://revigo.irb.hr/, (Supek et al., 2011)) with the following parameters: dispensability threshold = 0.5, GO metric = adjusted p-value (lower value is better), remove obsolete GO terms = yes, species = Whole UniProt database, similarity measure = SimRel.
ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X enrichment category within EnrichR (Chen et al., 2013) was used to identify transcription factors (104 transcription factors tested) that putatively bind to the differentially expressed genes. Pathway enrichment of druggable genes higher in the VL was performed using EnrichR (Chen et al., 2013) KEGG 2021 Human enrichment category (250 terms tested).

Differential isoform usage analysis
The VL and TA raw data was aligned using the Kallisto app (Kallisto Quantification version 2.0.2, Kallisto 04.46.1) on the DNAnexus platform pipeline (Bray et al., 2016) to obtain counts and relative abundance (TPM) for each transcript (Gencode v39, Ensembl 105).
Differential isoform usage analysis between VL and TA was performed using the IsoformSwitchAnalyzeR (version 1.12.0) R package (Vitting-Seerup and Sandelin, 2017). The design matrix included: sex, sample RIN, and batch. Gencode v39 primary assembly annotation and transcripts were used to generate the switch list. Isoforms were prefiltered before testing for differential isoform usage using the following parameters: gene expression cutoff = 0.1 and isoform expression cutoff = 0, and genes with only one isoform were excluded. Isoform switch testing was done using DEXSeq (version 1.36.0) (Anders et al., 2012) within the IsoformSwitchAnalyzeR package, and correction for confounding factors indicated in the design matrix was performed simultaneously via the limma package (version 3.46.0) (Ritchie et al., 2015). The isoform switch analysis was limited to: switching genes (genes with at least one isoform significantly differentially used), genes with consequence potential (with isoforms differentially used in opposing directions, i.e., one with increased and one with decreased usage), and isoforms with at least two isoforms significantly differentially used (alpha<0.05), and a difference in isoform usage between muscles of at least 0.01 (1%).

Identification of transcriptional differences between VL and TA
Because of the substantial difference in the rate of progression/damage of VL and TA in DMD ( Figure 1A), we characterized the intrinsic transcriptomic profiles of paired healthy VL and TA using RNAseq and snRNAseq to reveal candidate mechanisms that may underlie this differential susceptibility to DMD ( Figure 1B). VL and TA biopsies were sampled from each of 15 healthy young adults during the same procedure. Extraction of RNA from frozen skeletal muscle was adequate with an average RIN of 8 across all samples (range 7.1-8.7), and an average of 54 million sequencing reads were obtained per sample (range 45-76 million reads). One sample that had lower sequencing depth, and two samples that were outliers by hierarchical clustering and had relatively lower correlation with the overall dataset were excluded. A total of 27 healthy muscle samples (26 paired VL-TA, 1 unpaired VL) were used for further analysis. Two-dimensional principal component analysis (PCA) on the expression of all 22,414 expressed genes among 27 samples demonstrates that RNAseq data cluster predominantly by muscle type and that muscles from the same individual do not cluster together ( Figure 1C). This indicates that there is more expression similarity between unrelated individuals in either the VL or TA than within an individual, or alternatively stated there are more intraindividual gene expression differences between VL and TA than interindividual differences from genetic variation.
Using DESeq2 (Love et al., 2014) with a paired analysis design, we identified a large set of 3,410 significantly differentially expressed genes (Supplementary Table S1), or 15.2% of all genes, demonstrating a substantial number of gene expression differences between the skeletal muscle groups. When we randomize participant IDs within muscle groups such that samples are no longer paired, and test for differential gene expression, we do not observe as many differentially expressed genes as we do with a paired analysis (empirical p-value = 0, n = 1,000 permutations). That is, our paired analysis of both muscles from the same individuals allowed us to identify a larger number of differentially expressed genes than we would have with an unpaired design. The most statistically significant differentially expressed gene was the transcription factor ZNF385A, and other top differentially expressed genes (by fold difference or p-value) included MYH1, COL22A1, the transcription factors BNC2, SIM2 and ZNF273, and the non-coding RNAs lnc-HLCS-1, lnc-CES1-7, lnc-APOB-2, and lnc-RORA-1 ( Figure 1D). Genes classified as protein coding by the Ensembl automatic annotation system were more likely to be differentially expressed, comprising 82% of all differentially expressed genes, but a substantial number of long noncoding RNAs (lncRNAs) are differentially expressed between the muscles (Supplementary Table S1).

Frontiers in Genetics frontiersin.org
Gene ontology analysis on all differentially expressed genes revealed an enrichment of 619 biological processes and 85 cellular component categories (Supplementary Table S2A). After summarization of redundant terms with ReviGO, the summarized GO term list is comprised of 75 biological processes and 28 cellular component GO terms (Supplementary Table S2B). We further focused on GO terms with over 10 genes, such that we could examine a larger number of genes contributing to the enrichment, resulting in a list of 58 biological processes and 21 cellular components. For each GO category, we ranked them by adjusted p-value, and then selected 6 relevant categories (Table 1) based on their involvement in muscle function and the dystrophic pathology.

Cell type differences in VL and TA
Fiber type composition varies between skeletal muscles in mice (Terry et al., 2018) and humans (Abbassi-Daloii et al., 2023). In humans, VL has a larger portion of fast myofibers than TA (Edgerton et al., 1975;Jakobsson et al., 1991), whereas in mouse, the TA is composed entirely of fast myofibers (Hämäläinen and Pette, 1993;Scripture-Adams et al., 2022). In DMD, the differential disease susceptibility between different skeletal muscles has been partly attributed to the higher proportion of fast fibers, which are more susceptible to damage in the disease course than slow fibers (Webster et al., 1988). These differences in cell composition may contribute to the differential disease susceptibility and be reflected in Healthy VL and TA transcriptomes are highly different. (A) Representation of the differential progression of vastus lateralis (VL) and tibialis anterior (TA) in DMD. The color scale indicates the progression from early stages of DMD where muscle fat fraction is minimal (illustrated in red), to late stages where muscle fibers are completely replaced by fat (illustrated in yellow). (B) Workflow for the identification of candidate mechanisms mediating differential muscle susceptibility to DMD. (C) The first two principal components (PC1 and PC2) are shown for batch-corrected normalized RNAseq data expression of all expressed genes (n = 22,414) across the 27 muscle samples. (D) Volcano plot for all 22,414 genes tested for differential expression. Dashed lines depict a fold change of 2 and a p-value of 0.05. For each muscle, the top 5 differentially expressed genes by fold difference and the top 5 genes by p-value are labeled.
Frontiers in Genetics frontiersin.org the observed transcriptomic differences. To assess the contribution of cell composition to gene expression, we performed snRNAseq of nuclei dissociated from a small subset of the healthy individuals. After excluding doublets and nuclei with less than 200 UMI, the VL and TA dataset consists of 14,887 single nuclei (5,151 VL and 9,736 TA) with a median of 386 genes and 568 UMI per nucleus, and with a total of 25,248 genes detected within all of the nuclei. Clustering analysis resulted in the identification of 8 known major cell types ( Figure 2A) with distinct transcriptomes (refer to Supplementary Table S2C for a list of positive marker genes). We compared the proportion of each major cell type within VL and TA. Consistent with previous reports, the three VL samples had a higher proportion of fast fibers compared to TA (paired t-test p = 4.25E-02, average fold difference 1.82) ( Figure 2C). The 3 TA samples had 1.29 times as many slow fibers, although this difference was not statistically significant in our snRNAseq dataset (paired t-test p = 2.52E-01) ( Figure 2C). Performing snRNAseq on a subset of samples allows us to infer cell composition in bulk RNAseq. By integrating bulk RNAseq and snRNAseq, we can explore the transcriptomic profile of our large dataset taking into consideration if the gene is specifically expressed in just 1 cell type, and thus map the differential expression of some genes to cell type. The snRNAseq dataset was also used to infer the percentage of all major cell types across our larger bulk RNAseq dataset. For this, we used CIBERSORTx (Newman et al., 2019) to deconvolute the bulk RNAseq data with 69 marker genes that we identified and define as being uniquely expressed within only one of the 8 major cell types ( Figure 2B). Overall, the percentage of cell types inferred by CIBERSORTx agreed with those observed by snRNAseq in the six samples with both data types (Pearson correlation = 0.81), and we can infer from bulk RNAseq that TA has a larger portion of slow fibers than VL (paired t-test p = 3.69E-05, average fold difference 1.54) and VL has a higher portion of fast fibers (paired t-test p = 1.93E-04, average fold difference 2.07) ( Figure 2C). We also observed a slight but significant increase in the percentage of endothelial (paired t-test p = 4.40E-03, average fold difference 1.16), pericytes (paired t-test p = 7.22E-03, average fold difference 1.14), and immune (paired t-test p = 3.63E-03, average fold difference 1.16) cells in TA compared to VL. The higher percentage of endothelial cells and pericytes in the TA is suggestive of a higher capillarity density compared to the VL. Similar differences in capillarity density across leg muscles have been reported previously, with a higher density in the lower leg gastrocnemius lateralis compared to the upper leg semitendinosus muscle (Abbassi-Daloii et al., 2023).

Mapping of differentially expressed genes to specific cell types within skeletal muscles
We next sought to identify the cellular source of differentially expressed genes identified by bulk RNAseq by interrogating their relative expression across the 8 major cell types identified in healthy human muscle. Out of 3,410 differentially expressed genes observed within the bulk RNAseq data, 3,221 (94.4%) were also observed in our snRNAseq dataset ( Figure 2D). Hierarchical clustering of their expression shows that the differentially expressed genes are typically not expressed in all cell types, but rather the vast majority are observed to have much higher expression in 1 cell type.
475 (14.75%) and 327 (10.15%) of the differentially expressed genes have the highest average expression in the fast and slow myofibers, respectively. Considering that the VL has a higher proportion of fast myofibers than TA, we expected to observe many genes that are higher in VL from the bulk RNA analysis to Frontiers in Genetics frontiersin.org be higher because they are expressed in fast fibers, and those higher in TA to be highest expressed in slow fibers. In line with this, we observed that differentially expressed genes that are higher in VL are more often expressed highest in fast fibers (445 genes, or 80.0% of the 556 genes higher in VL with highest expression in the myofibers) and conversely, differentially expressed genes that are higher in TA are most often restricted in their expression in slow fibers (216 or 87.8% of the 246 genes higher in TA with highest expression in the myofibers). However, there are exceptions to this expected pattern of expression based on the higher proportion of fast fibers in VL compared to TA (Supplementary Figure S1), and these may indicate shifts in metabolic phenotype within each muscle type. For instance, 111 of 1,559 genes higher in VL are most highly expressed in slow fibers (Supplementary Figure S1A) and 30 of 1,851 genes higher in TA have highest expression in fast fibers (Supplementary Figure  S1B). Interesting exceptions include CAMK2A (encoding CaMKIIα) which is among the genes higher in the VL with higher expression in slow fibers than fast fibers. Conversely, IRX3, encoding iroquois homeobox 3, which has been linked to body weight (Gholamalizadeh et al., 2019), has ten-fold higher expression in fast fibers compared to slow fibers, but is higher in the TA muscles.
Remarkably, the remaining 75.1% of the differentially expressed genes have highest expression in other muscle resident cell types that are not the myofibers. Despite satellite cells, endothelial, smooth muscle and fibroblasts accounting for 6.60%, 7.01%, 0.69% and 3.17% of total cell population in both VL and TA, the percentage of Frontiers in Genetics frontiersin.org differentially expressed genes with highest expression in these cell types were 15.71%, 14.96%, 10.87% and 8.20%, respectively, demonstrating differences in virtually all cells between skeletal muscle groups. Pericytes accounted for 17.57% and immune cells for 15.79% of all cells but had fewer genes that were detected as differentially expressed, 9.87% and 15.49%, respectively.
To determine which cell types have the highest expression of the differentially expressed genes within functional categories, we mapped cell expression using the single nuclei data (Supplementary Table S1). Eight of 13 (61.54%) genes in "Regulation Of Release Of Sequestered Calcium Ion Into Cytosol By Sarcoplasmic Reticulum" ( Figure 3A) were higher in the VL, and these have highest expression predominantly in fast fibers (5 genes, 38.46%). These include CALM1 and CASQ1, encoding calmodulin 1 and calsequestrin 1, respectively. Only PDE4D has highest expression in slow fibers, which we infer is differentially expressed independent of the differences in fiber type composition.
Genes in "Regulation Of Apoptotic Process" are more broadly expressed across all cell types ( Figure 3C), suggesting that differential regulation of cell death is a characteristic of all cells in the VL and TA due to muscle origin. 22 genes have highest expression in the myofibers, including the heat shock protein CRYAB higher in the TA, mapping to the slow fibers and also annotated in the biological process category "Negative Regulation of Apoptotic Process", which is also enriched among the differentially expressed genes (Supplementary  Table S2A). Among genes with higher expression in other muscle resident cells, the widely studied anti-apoptotic BCL-xL/BCL2L1, higher in the TA, was highest expressed in endothelial cells. Despite not being annotated in "Regulation Of Apoptotic Process", Hzf (encoded by ZNF385A) has been linked to negative regulation of apoptosis. In conditions of DNA-damaging stress, Hzf induction and binding to p53 modulates p53-mediated transcription such that the expression of pro-arrest p53 target genes is preferentially activated over pro-apoptotic p53 target genes (Das et al., 2007). ZNF385A is the most statistically significant gene with a 4.2X higher expression in TA (p = 1.76E-102) ( Figure 1D), and has the highest expression in pericytes (average expression 0.31 TP10K) and similar levels of expression in fast and slow fibers (average expression 0.03 TP10K in both fiber types). These data suggest that the VL and TA have differential regulation of apoptotic signaling, with a potentially superior negative regulation in the TA that may be protective in DMD.

Search for transcription factors that may underlie the differential gene expression
Using the ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X enrichment category within EnrichR (Chen et al., 2013), we identified 45 transcription factor genes that are reported to bind to multiple differentially expressed genes between VL and TA, and these may thus regulate the differentially expressed genes (Supplementary Table S2D). Because some transcription factors can act as both positive and negative regulators of expression, we searched for transcription factors that bind upstream of all differentially expressed genes, independently of the muscle in which they are highest expressed. Among these 45 transcription factors, 38 were expressed in the VL/TA bulk RNAseq dataset, and 11 were differentially expressed. The top 6 by p-value are: TP63, AR, GATA2, KLF4, SMC3, and SMAD4 (Supplementary Table S2D). For each transcription factor, the putative target genes are listed in Supplementary Table S2D. These genes are further categorized in "Regulation Of Release Of Sequestered Calcium Ion Into Cytosol By Sarcoplasmic Reticulum", "Collagen-Containing Extracellular Matrix", and "Regulation Of Apoptotic Process" by the muscle in which they are highest expressed and listed in descending fold difference between the muscles (Supplementary Table S2D).
The 11 differentially expressed transcription factors were detected by snRNAseq (Supplementary Figure S2). The transcription factors higher in TA (ZMIZ1, FOSL2, GATA2, KLF4 and EGR1) are mainly expressed in non-myolineage cell types, and mainly in fibroblasts and endothelial cells, consistent with a potential role regulating the extracellular matrix gene expression in non-myolineage cells. Among the differentially expressed genes in "Collagen-Containing Extracellular Matrix" and higher in TA, the metalloprotease MMP2 is a target of GATA2 (Supplementary Table S2D). The transcription factors higher in VL are expressed in myolineage and non-myolineage cell types. TP63 is restricted to the myofibers, and highest in fast fibers, whereas AR is highest expressed from satellite cells. Among the differentially expressed genes in "Regulation Of Release Of Sequestered Calcium Ion Into Cytosol By Sarcoplasmic Reticulum" is the TP63 target ATP1A2 (Supplementary Table  S2D), highest expressed in fast fibers (Supplementary Table S1), and the AR target CALM1 (Supplementary Table S2D) with highest expression in endothelial and fast fibers (Supplementary Table S1).
3.5 Genes that are previously reported as DMD biomarkers and genetic modifiers are enriched among genes differentially expressed between healthy VL and TA To explore potential relationships between our differential gene expression and reported mechanisms of DMD pathology that can be mapped to individual genes, we analyzed the differential expression of previously reported human serum/blood DMD biomarkers (Hathout et al., 2014;Parolo et al., 2018;Spitali et al., 2018;Al-Khalili Szigyarto, 2020;Grounds et al., 2020;Alonso-Jiménez et al., 2021;Wagner et al., 2021;Lee-Gannon et al., 2022;Wu et al., 2022), and of genes modifying the phenotype of DMD in humans (Pegoraro et al., 2011;Flanigan et al., 2013;Bello et al., 2016; Frontiers in Genetics frontiersin.org Hogarth et al., 2017;Li et al., 2018;Weiss et al., 2018;Spitali et al., 2020;Flanigan et al., 2023) or the mdx mouse (Deconinck et al., 1997;Wagner et al., 2002;Han et al., 2011;Morales et al., 2013;de Zélicourt et al., 2022), also known as genetic modifiers, across the VL and TA and identified the predominant cell type in which they were expressed. Among 88 DMD biomarkers expressed in healthy muscle, 41 (46.59%) were differentially expressed between VL and TA (Supplementary Figure S3A), a significant enrichment of this set of genes among differentially expressed genes (only 13 expected, p-value < 1E0-5, χ 2 test). Of this set of differentially expressed biomarkers, 20 of 41 (48.78%) have highest expression in the myofibers, with 10 of these being more expressed in the slow and 10 in the fast fibers. Among these myofiber-derived biomarkers, 13 have the highest expression in VL, and 9 of these (69.2%) have highest expression in the fast fibers (ALDOA, CAMK2B, ENO3, GAPDH, LDHA, MSTN, MYL1, PYGM, TNNI2). The remaining 4 biomarkers higher in VL have either highest expression in the slow fibers (CAMK2A, ACTA1) or are similarly expressed in fast and slow fibers (CKM, TTN).
We also culled from the literature 18 genes described as genetic modifiers based on either a human genetic variant association with a DMD phenotype (SPP1, ACTN3, THBS1, LTBP4, HLA-A, DYNLT5, CD40, NCALD, ETAA1, ADAMTS19, MAN1A1, GALNTL6, PARD6G) or an mdx phenotype modified by concomitant deletion of another gene (MSTN, DYSF, CCN2, UTRN, CD38) (Deconinck et al., 1997;Wagner et al., 2002;Han et al., 2011;Pegoraro et al., 2011;Flanigan et al., 2013;Morales et al., 2013;Bello et al., 2016;Hogarth et al., 2017;Li et al., 2018;Weiss et al., 2018;Spitali et al., 2020;de Zélicourt et al., 2022;Flanigan et al., 2023). All were observable within the RNAseq and snRNAseq datasets, except SPP1 (encoding osteopontin), which has a median TPM of 0.01 across VL and TA RNAseq and was not detected in snRNAseq of healthy VL and TA in any cell type. Thus, SPP1 was below the limits of detection, and it was excluded from the differential gene expression analysis. 11 of 17 (64.7%) remaining DMD genetic modifiers were differentially expressed between VL and TA (Supplementary Figure S3B). This is a larger number than expected from a random sampling of all genes (only 3 expected, p-value < 1E0-5, χ 2 test). Of these 11 genetic modifiers with differential expression detected, DYSF, MSTN, ACTN3, NCALD, ADAMTS19 and CD38 were higher in the VL, and HLA-A, UTRN, LTBP4, CCN2/CTGF, and THBS1 were higher in the TA (Supplementary Figure S3B). This indirectly supports the relevance of genetic variants indeed contributing to differential disease progression across individuals. Most of the genetic modifiers (10 of 17) had expression mainly within a nonmyofiber cell type, consistent with the known role of nonmyofiber lineage cells in orchestrating muscle remodeling during regeneration and fibrosis (Mann et al., 2011).
LTBP4 is most expressed in fibroblasts (Supplementary Figure  S3B), consistent with prior reports on its ameliorative effect through a reduction in TGF-β signaling in fibroblasts with the IAAM haplotype (Flanigan et al., 2013). In addition to fibroblasts, LTBP4 also shows high expression in satellite cells, suggesting the potential of a modifying effect acting upon muscle stem cells that has not been studied previously. DYNLT5 (also known as TCTEX1D1) has a median TPM of 0.6 in the bulk RNAseq dataset, and it was not detected in fast or slow fibers by snRNAseq but was rather expressed in endothelial cells. This suggests its modifying mechanism is not due to direct expression within myofibers, or that it could be upregulated in a cell type other than endothelial cells in DMD to exert its modifying mechanism. NCALD, encoding the calciumsensing neurocalcin delta, is most expressed in smooth muscle (7.11 TP10K), and has lower expression in fast (1.90 TP10K) and slow (0.42 TP10K) fibers. Its proposed mechanism is via regulation of a surrogate cGMP pathway that compensates for the defective nitric oxide-induced cGMP production in DMD, with lower expression of NCALD being protective (Flanigan et al., 2023). Consistent with this proposed mechanism, NCALD is not only higher in the VL in bulk RNAseq, but also in the VL fast (2.51X, p = 2.11E-76) and VL slow (1.11X, p = 3.01E-02) fibers compared to the TA fast and slow fibers, respectively. HLA-A is expressed higher in the VL, and class I MHC expression on myofibers may influence immune mediated mechanisms of myofiber damage in dystrophic muscle.
Only four reported genetic modifiers, DYSF, ADAMTS19, MSTN, and ACTN3 are observed to have the highest expression in myofibers (Supplementary Figure S3C). The expression pattern of DYSF, MSTN and ACTN3 is consistent with their described modifying mechanisms (Wagner et al., 2002;Vincent et al., 2007;Han et al., 2011). DYSF has similar expression in fast and slow fibers, with a slight 1.2X higher expression in fast fibers. Although the proposed modifying mechanism of ADAMTS19 is through extracellular matrix (ECM) remodeling and TGF-β signaling (Flanigan et al., 2023), its highest expression in healthy muscle was not in fibroblasts (0.12 TP10K) or vasculature cells that typically produce ECM, but rather in the fast (1.58 TP10K) and slow (1.07 TP10K) fibers. A 13.2X higher expression of ADAMTS19 in the fast fibers compared to fibroblasts suggests a modifying role in the myofibers that needs further exploration. At the single cell level, ADAMTS19 is 1.55X higher in the VL slow fibers compared to the TA slow fibers (p = 1.18E-14), which further contributes to its higher expression in the VL. MSTN is expressed in both fast and slow fibers, with highest expression in fast fibers (4.4X compared to slow fibers), a pattern of expression that contributes to it being higher in VL by bulk RNAseq, as VL has a higher proportion of fast fibers. ACTN3 is highly specific to fast fibers (13.3X higher in fast fibers), although not absent in slow fibers. In addition, at the single cell level, ACTN3 expression is 1.43X higher in the VL fast fibers compared to the TA fast fibers (p = 2.17E-11), indicating that the higher expression of ACTN3 in VL is influenced by both a higher proportion of fast fibers and by a VL-specific upregulation within the fast fibers.
The well-studied null allele of ACTN3 (rs1815739, NM_ 001104.4:c.1729C>T, NP_001095.2:p.Arg577Ter/p.R577X) is a common allele found in the population with a frequency of the X allele of 0.36 (dbSNP). Actinin-3 loss was associated with a reduced DMD severity as measured by a longer 10-min walk test (Hogarth et al., 2017), and this was attributed to a switch to a more protective oxidative metabolism without a shift in fiber type distribution (MacArthur et al., 2008). To further investigate the effects of ACTN3 expression across muscles, we genotyped rs1815739 in the 15 individuals. We identified 3 null homozygotes (XX), and 3 reference homozygotes (RR), and 9 heterozygotes (RX) among the 15 individuals. The expression of ACTN3 was significantly differentially expressed dependent on Frontiers in Genetics frontiersin.org genotype (Kruskal-Wallis p = 7.73E-04), with the RR group showing highest expression, indicating nonsense-mediated decay of the X allele ( Figure 4A). XX homozygotes have no ACTN3 mRNA expression for both VL and TA muscles ( Figure 4A). The expression of ACTN3 RR and RX mRNA was consistently higher in the VL compared to TA, although only statistically significant in the RX genotype (Wilcoxon p = 9.9E-04). For both the VL and TA, the mean level of ACTN3 mRNA in RX heterozygotes was The bars in the box plots indicate 1.5* IQR, which is the interquartile range, or the distance between the first and third quartiles.
Frontiers in Genetics frontiersin.org substantially lower than the expected 50% of the RR genotype mRNA level, suggesting that the X allele reduces the expression of the R allele through unknown mechanisms.
To validate the RNA findings and assess whether the ACTN3 genotype groups have differences in fiber type composition, we performed immunofluorescent staining for 3 RR, 2 RX and 2 XX individuals' VL and TA. For each sample and muscle, 3 10 µM tissue sections were stained (total of 42 sections), and fibers were counted across the entire sections. An average of 204 fibers were counted per sample (range 25-758, total 8,577) (Supplementary Figure S4). Observed differences in mRNA expression were consistent with antibody staining for actinin-3. As expected, ACTN3 XX homozygotes showed no detectable protein ( Figures 4B,E). The percentage of actinin-3 positive fibers was consistently higher in the VL for both RR and RX genotypes ( Figure 4B). The percentage of type I slow fibers (positive for myosin-7) was consistently higher in the TA across genotype groups ( Figure 4C), as expected (Edgerton et al., 1975;Jakobsson et al., 1991). There was a higher percentage of type I slow fibers in the RX and XX groups compared to the RR group across both muscles, although only the XX group reached statistical significance ( Figure 4C). There was also a higher percentage of type I fibers in the XX compared to RX group. These findings are supported by observed similar relative expression of the myosin heavy chain genes at the RNA level (Supplementary Figure S5). Interestingly, we also identified slow fibers with low expression of actinin-3 ( Figure 4D) in the RR and RX genotypes but not in the XX genotypes, reflective of a low level of expression of actinin-3 in some slow myofibers. This low level of expression is only apparent because of the true null staining revealed in the XX genotype individuals.

Identification of druggable targets within differentially expressed genes
We place the list of differentially expressed genes into context as potential for disease modification because their RNA or protein products are targeted or 'druggable' with existing drugs documented in the DrugBank database (Wishart et al., 2018). 535 of 3,410 (15.7%) differentially expressed genes are reported targets of 1,812 known drugs (Supplementary Table S1). The protein product of 197 genes higher in the VL are targeted by 984 drugs, and thus may constitute a set of known drugs that may be explored to induce a shift of a VL-like susceptible state towards a TA-like protected state in DMD. Druggable genes expressed higher in VL are enriched in 158 pathways (Supplementary Table S2E). Among the top 5 most significant pathways is calcium signaling pathway, with 22 genes higher in VL that include calmodulin (CALM1, CALM2), calmodulin-dependent kinases (CAMK2A, CAMK2B, CAMK2G), calsequestrin (CASQ1), ryanodine receptor (RYR1), and the dihydropyridine receptor alpha 1S subunit (CACNA1S). These 8 genes alone are reported to be targeted by 71 drugs and may suggest ways to therapeutically regulate intracellular and sarcoplasmic reticulum (SR) calcium concentration in myofibers. CD38 (1.7X higher in VL) is highest expressed in the pericytes (1.83 TP10K), but also is expressed in fast (1.28 TP10K) and slow (0.48 TP10K) fibers. Its higher expression in VL is also observed at the single cell level, with 1.24X higher expression in VL fast fibers compared to TA fast fibers (p = 4.71E-04). CD38 encodes a NAD+ glycohydrolase that produces regulators of Ca2+ signaling, and deletion of CD38 or treatment with CD38 inhibitors restored the mdx heart, diaphragm and limb function, reduced fibrosis and inflammation, and reduced the cycles of degeneration and regeneration (de Zélicourt et al., 2022). DMD myotubes treated with a monoclonal antibody against CD38 (Isatuximab) reduced the frequency of spontaneous Ca2+ waves (de Zélicourt et al., 2022).

Identification of isoforms with differential abundance between VL and TA
Because extensive alternative splicing is observed in developing and mature muscle (Brinegar et al., 2017;Nakka et al., 2018), including in genes encoding sarcomere structural (Donner et al., 2004;Bowman et al., 2007;Lam et al., 2018;Savarese et al., 2018), and excitation-contraction coupling proteins (Nakka et al., 2018), we hypothesized that the VL and TA transcriptomes are also differentially influenced by alternative splicing leading to significant shifts in the usage of specific isoforms (isoform switch). To identify isoform switching events between VL and TA, we utilized IsoformSwitchAnalyzeR (Vitting-Seerup and Sandelin, 2017), which uses the abundance (TPM) and count data obtained from Kallisto transcript alignment. Prefiltering of the annotated transcripts resulted in 130,664 transcripts to be considered for isoform switch analysis. Further filtering of transcripts for switching genes with at least two significantly switching isoforms and with at least two isoforms preferentially used in opposed directions (higher in one muscle, lower in the other) resulted in 47 transcripts. Among these, 12 transcripts have a significant isoform switch (isoform switch q-value <0.05) between VL and TA, and these are located within 6 genes (gene switch q-value <0.05) (Supplementary Table S2F). Two of the 6 genes with isoform switching, NPR3 and TNNT1, were differentially expressed between VL and TA, whereas the remaining 4 (NEB, ABCC6P2, ENSG00000288071 and MAD1L1) did not have differential expression at the gene expression level ( Figure 5A; Supplementary Table S1). NEB is highly and similarly expressed in slow (217.6 TP10K) and fast (185.4 TP10K) myofibers (Supplementary Table S1). ENSG00000288071 and TNNT1 are more highly expressed in the slow fibers (6.8X and 16.4X higher expression in the slow compared to fast fibers), and ABCC6P2 in the fast fibers (0.009 TP10K in fast and not detected in slow fibers) (Supplementary Table S1). NPR3 and MAD1L1 are expressed highest in the smooth muscle cells (Supplementary Table S1). Four of 6 genes with isoform switch are protein coding (NEB, NPR3, TNNT1 and MAD1L1), whereas ENSG00000288071 is a long non-coding RNA, and ABCC6P2 is a transcribed unprocessed pseudogene (Supplementary Table S1).
To assess potential functional consequences of the isoform switches, we examined the biotype of each isoform in the switching ( Figure 5B). The NEB, TNNT1 and NPR3 isoform switches are among protein coding transcripts, and that of ENSG00000288071 is among long noncoding RNAs (lncRNA). The remaining 2 genes (ABCC6P2 and MAD1L1) are switching between isoforms of different biotypes ( Figure 5B). ABCC6P2-202, Frontiers in Genetics frontiersin.org Frontiers in Genetics frontiersin.org 13 preferentially used in TA, is a processed transcript, whereas ABCC6P2-201, preferentially used in VL, is a transcribed unprocessed pseudogene. MAD1L1-213, with preferential usage in TA, is protein coding, whereas MAD1L1-216, preferentially used in VL, is a processed transcript, which means that it does not have an open reading frame (a start codon followed by an inframe stop codon (Kute et al., 2022)).
The MAD1L1 isoform switch comprises MAD1L1-213 and MAD1L1-216 with the former used more in TA (absolute difference isoform fraction = 0.073) and the latter in VL (absolute difference isoform fraction = 0.043) (Supplementary Table S2F). MAD1L1 has a relatively low expression in muscle. The top 3 isoforms expressed in both VL and TA have an average TPM ranging from 0.80 to 1.65. MAD1L1 encodes for the mitotic arrest deficient-like protein 1 (also known as MAD1). In mdx, Mad1l1 is most expressed in late activated satellite cells, myoblasts and myocytes (Scripture-Adams et al., 2022) (data not shown), suggesting a potential role in muscle regeneration in wild-type muscle and in DMD, although which isoform is most important is unknown. The MAD1L1 isoform switch involves a switch from a protein coding isoform in TA to a processed transcript that has no open reading frame in VL, suggesting a potential mechanism of reducing its protein expression in the VL via alternative splicing, an event that cannot be detected by gene expression analysis.
The gene with the most striking isoform switch is NEB. This switch comprises the mutually exclusive exon splicing event that occurs between exons 143 and 144 of NEB, which has been previously described (Donner et al., 2004;Lam et al., 2018). Exons 143 (E143, included in NEB-207) and 144 (E144, included in NEB-202) are mutually exclusive exons ( Figure 5C), such that in the same transcript, only one of either exon is included. NEB-207 has a higher usage in TA, a 0.505 isoform fraction difference compared to VL (Supplementary Table S2F; Figure 5D). NEB-202 has a higher usage in VL, with a 0.443 isoform fraction difference compared to TA (Supplementary Table S2; Figure 5D). This difference in isoform usage is readily observed at the isoform expression level ( Figure 5E). NEB-207 has higher expression in TA, with an average TPM of 850, compared to an average TPM of 66 in VL. NEB-202 is more broadly expressed across both muscles but has a preferential expression in the VL with an average TPM of 1,357, compared to an average TPM of 641 in TA. We confirmed this differential alternative splicing event in the RNA by semiquantitative reverse transcription polymerase chain reaction (RT-PCR) using exon junction-specific primers (data not shown).
Previous reports sought to determine whether the expression of nebulin exon 143 presents a fiber type-specific pattern in adult human quadriceps (Lam et al., 2018). In this previous study, E143 was found expressed more often in fast fibers compared to slow fibers (Lam et al., 2018), although a distinction between type IIa and type IIx fast fibers was not explored. Consequently, it was concluded that fast fibers usually express E143, and that slow fibers may express either E143 or E144. However, because we observe that the VL mainly includes E144 and not E143, and because VL has in average 1.82 more fast fibers than TA ( Figure 2C), we reasoned that the differential pattern of expression of E143 between VL and TA is not solely dependent on fast fiber type. Thus, we assessed the protein expression of E143 across VL and TA in relation to type IIa and IIx fast myosin. We examined overall E143 protein intensity among VL and TA in 3 healthy individuals. We found low to no E143 myofiber intracellular protein expression in the VL among either fiber type ( Figure 5G). Consistent with this observation, overall E143 protein intensity was statistically higher in the TA compared to the VL (Wilcoxon test p = 1.29E-08, 95% CI = 2.70-7.04, average fold difference = 5.77) ( Figure 5F). These findings suggest that although nebulin including exon 143 is more often expressed in the fast fibers (Lam et al., 2018), and E143 is more consistently observed in the fast type IIx than in the slow type I (data not shown), a fast fiber type is not the sole determinant of its expression in human skeletal muscle. That is, that the association between fast myosin and exon 143 of nebulin, as described previously (Lam et al., 2018) is also muscle-type specific, and might be regulated by specific differentially expressed splicing factors, or their combinations.
To identify potential splicing factors underlying the alternative splicing observed between VL and TA, we looked for splicing factors that are differentially expressed between the muscles. Out of 66 expressed splicing factors obtained from the SpliceAid-F database (Giulietti et al., 2013), 18 (27.3%) were differentially expressed between VL and TA (Supplementary Table S1). Only one splicing factor, NOVA2, is higher in TA, and the remaining 17 are higher in VL, with ESRP2 and CELF2 being the most differentially expressed.

Comparison of VL versus TA differentially expressed genes with DMD versus healthy muscle differentially expressed genes
To assess whether the differentially expressed genes between healthy muscles differentially susceptible to lack of dystrophin are also changed in expression in the context of DMD, we generated bulk RNAseq data of the TA from 8 young ambulatory DMD patients (mean age 4.5 years). An average of 60 million paired end sequencing reads were generated per sample (range 43-72 million reads). To our knowledge, this is the second and largest reported bulk RNAseq dataset of DMD muscle (an existing dataset can be found in SRA ID PRJNA734152), and the first of the TA muscle.
Using DESeq2 (Love et al., 2014), we performed differential gene expression analysis between DMD (n = 8) and healthy TA (n = 13). 868 of 17,183 analyzed genes were differentially expressed between DMD and healthy (Supplementary Table  S3; Figure 6A). 272 of these genes were also differentially expressed between VL and TA ( Figure 6B). Among these overlapping genes, 67 were downregulated in DMD and higher in the less susceptible TA or upregulated in DMD and higher in the more affected VL ( Figure 6B). Next, we assessed functional enrichment in the genes dysregulated in DMD using EnrichR. 49 biological processes and 11 cellular component categories were enriched among genes dysregulated in DMD (adjusted p-value <0.05) (Supplementary Table S2G). Among these, 31 categories were also enriched among genes differentially expressed between VL and TA. "Collagen-Containing Extracellular Matrix" was the most significant shared term and also the most significant term among all enriched in DMD versus Frontiers in Genetics frontiersin.org Table S2G). The categories with over 30 gene members include "Collagen-Containing Extracellular Matrix" and "Negative Regulation of Apoptotic Process", supporting the involvement of these gene sets in both the differential susceptibility between VL and TA, and the dystrophic pathology ( Figure 6C).

FIGURE 6
Extracellular matrix and regulation of apoptosis are also dysregulated in DMD. (A) Volcano plot for all 17,183 genes tested for differential expression between DMD and healthy TA. Dashed lines depict a fold change of 2 and a p-value of 0.05. For each up and downregulated genes, the top 5 differentially expressed genes by fold change and the top 5 genes by p-value are labeled. (B) Overlap of the differentially expressed genes between VL and TA and between DMD and Healthy. The 67 genes are those downregulated in DMD and higher in TA or upregulated in DMD and higher in VL. (C) 7 of 9 functional categories with over 30 genes that are shared between the two analyses (sorted by ascending p-value). "Nervous System Development" (which had only VEGFA in the 67-gene list, and "Cell-Substrate Junction" (which has the same 37 genes as "Focal Adhesion", and a larger p-value) were excluded. (D) Single cell expression of the 67 overlapping genes. The overlapping GO terms from (C) in which each gene member is categorized among these 7 categories are indicated. "Druggable" indicates which genes have existing drugs documented on DrugBank. "Direction" indicates the direction of expression in both the VL versus TA and DMD versus Healthy differential gene expression analyses.
Frontiers in Genetics frontiersin.org To identify candidate susceptibility factors that are also dysregulated in DMD, we further explored the 67 overlapping genes. All except one gene (SBK3) were detected in the healthy muscle snRNAseq ( Figure 6D). 13 of these 67 genes are druggable ( Figure 6D). Among the overlapping protein coding genes, the 3 most dysregulated genes in DMD (by fold change) are NR4A3, APOB and MYOC, and the 3 most differentially expressed in VL compared to TA are APOB, SBK3 and NR4A3, highlighting the relevance of these genes in DMD and consequently, the differential susceptibility of VL and TA (Supplementary Table S3).
Among genes in "Collagen-Containing Extracellular Matrix" are MYOC and CILP ( Figure 6D). MYOC, encoding myocilin, is most expressed in fibroblasts ( Figure 6D; Supplementary Table S1) and is downregulated 47.5X in DMD (Supplementary Table  S3), despite the expansion of fibroblasts within dystrophic muscle (Scripture-Adams et al., 2022), suggesting a downregulation in dystrophic fibroblasts. Myocilin has been widely studied in glaucoma as a secreted protein in the trabecular meshwork (Resch and Fautsch, 2009). Myocilin was also found to be induced during C2C12 myoblast differentiation via regulation of the TGF-β pathway (Zhang et al., 2021), and to interact with the dystrophin-glycoprotein complex via syntrophin (Joe et al., 2012). In the Human Protein Atlas (Uhlén et al., 2015), MYOC is highest expressed in fibroblasts in skeletal muscle, and not in myocytes, consistent with our observations. The overexpression of MYOC increases muscle mass (Joe et al., 2012), and downregulation of myocilin is observed in cancer cachexia, with its loss inducing muscle fiber atrophy and an increase in fibrotic and fatty tissue (Judge et al., 2020). Because its expression is highest in fibroblasts, and it is found within "Collagen-Containing Extracellular Matrix", we hypothesize that its main role in human skeletal muscle is in the fibroblasts, and not the myolineage, and that its downregulation promotes fibrosis. These data, along with the 1.77X higher expression of MYOC in the TA (Supplementary Table S1), support myocilin as a protective factor for the TA.
Conversely, CILP, encoding the cartilage intermediate layer protein 1 (CILP-1), is upregulated 4.3X in DMD (Supplementary  Table S3), and is 1.27X higher in the more susceptible VL. CILP also has highest expression in the fibroblasts in our dataset ( Figure 6D; Supplementary Table S1) and in the Human Protein Atlas (Uhlén et al., 2015), but its role is not well understood. Upregulation of CILP-1 occurs upon cardiac injury in fibrotic regions, and there is a decrease in serum of patients with heart failure (Park et al., 2020). Its anti-fibrotic effect in pressure-overload cardiac remodeling  suggests that CILP-1 is regulated in relation to processes that involve cardiac fibrosis. Because of its upregulation in DMD and higher expression in VL, and its restricted expression in the fibroblasts, we hypothesize that CILP-1 is pro-fibrotic in skeletal muscle, and a susceptibility factor for the VL.
Among other overlapping genes is SLC19A2, which is the most statistically significantly dysregulated gene in DMD compared to healthy TA (p = 1.59E-10), and which is downregulated 41.77X in DMD (Supplementary Table S3). SLC19A2 encodes the thiamine (vitamin B1) transporter 1 (THT1), which has highest expression in skeletal muscle (GTEx), specifically in slow fibers ( Figure 6A; Supplementary  Table S1). Thiamine supplementation has been shown to improve muscle strength in myotonic dystrophy type 1 (Costantini et al., 2016). In mdx, supplementation with the thiamine precursor benfotiamine ameliorated the dystrophic pathology and increased grip strength (Woodman et al., 2018), supporting a protective role for the TA compared to VL in DMD, and potentially also in the slow fibers compared to fast fibers. Lastly, KIF21A, highest expressed in slow fibers, is downregulated 2.91X in DMD (Supplementary Table S3), and is 1.20X higher in the TA (Supplementary Table S1). Heterozygous mutations in KIF21A cause autosomal dominant congenital fibrosis of extraocular muscles (EOM) (Yamada et al., 2003). The downregulation of KIF21A in DMD skeletal muscle, reduced function leading to pathologic fibrosis in EOM, and its higher expression in the TA suggest a protective role of higher KIF21A expression within myofibers that leads to some protection from damage in DMD. KIF21A encodes a kinesin, involved in cargo transport between the Golgi apparatus and the endoplasmic reticulum (Hirokawa and Noda, 2008), but its role in skeletal myofibers is not established.
Among the 67 overlapping genes, APOE is found in "Negative Regulation of Apoptotic Process" and is upregulated in DMD and higher in the VL. APOE is a highly specific satellite cell marker in healthy muscle ( Figure 2B), suggesting that a potential differential regulation of apoptotic signaling ( Figure 3C) may alter the regenerative capabilities of VL and TA.
Cell type specificity of the 67 genes differentially expressed between VL and TA and also dysregulated in DMD ( Figure 6D) was examined within previously published single cell and nuclei RNAseq datasets from the mouse TA (scMuscle) (McKellar et al., 2021) and soleus (myoatlas) (Petrany et al., 2020). Using these datasets, the cell type specificity of 17 protein coding genes was confirmed. These include MYOC and CILP, highest expressed in the fibroblasts in both scMuscle (Supplementary Figure S6A) and myoatlas (Supplementary Figure S6B) in mouse.
We note PDK4 as the most dysregulated gene within "Negative Regulation of Apoptotic Process", and the most dysregulated protein coding gene among all 868 genes differentially expressed between DMD and healthy TA in our transcriptome-wide analysis. PDK4, encoding pyruvate dehydrogenase kinase 4, is downregulated 1,453X in DMD and only superseded by the unprocessed pseudogene OR7E29P, which is downregulated 11,396X ( Figure 6A; Supplementary Table S3). Downregulation of PDK4 in DMD has been previously reported in the slow (type I) fibers (1.74X, p = 2.78E-04), and upregulation in the fast type IIa and IIx (Scripture-Adams et al., 2022). PDK4 is also downregulated in mdx quadriceps and TA compared to age-matched controls (Matsakas et al., 2013). PDK4 is downregulated in DMD, but higher in the more susceptible VL (1.98X, Supplementary Table S1). Interestingly, PDK4 is a druggable gene, with tretinoin (a vitamin A derivative) being a known upregulator (Supplementary Table S1).

Discussion
Our goal in this study was to analyze two different healthy limb muscles with more similar functional roles (VL and TA), which have a consistently observed difference in disease progression, that is more modest in degree than the greater protection from damage of EOM compared to limb muscles in the absence of dystrophin. Because of published longitudinal imaging data (Rooney et al., 2020), we could select comparable muscles amenable to biopsy in healthy adults (Barthelemy et al., 2020). The protection of TA relative to VL is less striking than that of EOM compared to limb but is still substantial with an estimated shift in equivalent damage of 8.5 years in humans (Rooney et al., 2020). VL and TA demonstrate a substantial difference in their susceptibility to lack of dystrophin, and our transcriptomic study of paired samples from the same healthy individuals identifies a large portion of the transcriptome as altered, with 3,410 differentially expressed genes. There is inherent biological variability within each large muscle, and there is some potential variability added to transcriptomic comparisons due to the small sample analyzed, which may not be representative of the whole muscle. The relatively small sampling by biopsy can introduce variability from sampling different parts of each muscle. This could result in a reduction in the number of differentially expressed genes, but should not lead to false expression differences between muscle groups. We try to limit variation by sampling the same relative location of VL and TA, which indeed resulted in highly significant differential gene expression detection. Because the transcriptomic differences between muscles in this study greatly exceeded those driven by genetic variation, we note that future studies may not require a paired design approach that was used here to maximize discovery and control for interindividual differences.
Our study particularly highlights calcium homeostasis, ECM, and regulation of apoptosis, and provides a dataset for exploration to investigate potential protective mechanisms of myofibers to loss of dystrophin in skeletal muscle. By studying muscles that have a substantial difference in their rate of disease progression in DMD, we sought to identify mechanisms of myofiber protection, complement genetic modifier studies and reveal novel therapeutic targets. There is some overlap between prior gene expression work comparing EOM to limb muscles and this study comparing TA to VL, including enrichment of genes with functions related to sarcomere structure, calcium homeostasis, muscle development, metabolic and immune processes, vasculature development, regulation of cell death and extracellular matrix, and thus supports that these pathways are relevant to how muscles are differentially susceptible to damage with lack of dystrophin. Comparison with genes dysregulated in DMD skeletal muscle compared to healthy further highlights the potential role of ECM and negative regulation of apoptosis in the differential susceptibility of VL and TA in DMD. We note that the mean age of our DMD cohort (4.5 years) is younger than the healthy cohort (21.2 years), and we attempted to reduce the effect of age on the identified gene expression differences. However, age could also be contributing to gene expression differences reported here.
Recently, a relatively higher regenerative capacity of EOM muscle stem cells was identified and attributed to upregulation of thyroid-stimulating hormone receptor (TSHR) signaling through upregulation of adenylate cyclase activity in EOM relative to limb muscle (Taglietti et al., 2023). Although TSHR is not differentially expressed between VL and TA in our study, "Adenylate Cyclase-Activating G Protein-Coupled Receptor Signaling Pathway" trended toward significance among the biological process GO terms (p = 9.92E-02), with 16 of the 17 genes higher in the TA (data not shown), suggesting a protective role in the TA and consistent with the proposed therapeutic relevance of upregulation of adenylate cyclase in DMD, where adenylate cyclase activation stimulates TSHR signaling, reduces muscle stem cell senescence and improves their proliferation (Taglietti et al., 2023).
Of note, only 24.9% of the differentially expressed genes were highest expressed in the myofibers, indicating that many of nonmyofiber cells are likely to play an important role in protecting myofibers from death. A caution of our work is that the healthy muscles are sampled without active degeneration/regeneration or induced muscle damage, which is a chronic state in DMD, and thus our data does not necessarily reveal mechanisms that may be only induced with muscle injury.
ECM deposition is an important component of the muscle structure and function (Loreti and Sacco, 2022), and there is an enrichment of differentially expressed genes that encode "Collagen-Containing Extracellular Matrix". ECM remodeling is necessary to properly activate muscle stem cells during regeneration, and the dysregulation of ECM proteins has been associated with regeneration defects in muscle diseases (Loreti and Sacco, 2022). In addition, the ECM stiffness, which varies depending on ECM composition, can modulate satellite cell activity and myofibergenerated force during contraction, and undergoes changes with age (Sinha et al., 2020). Thus, observed differences in ECM gene expression in VL and TA may contribute to their differential progression in DMD and cause differences in the fibrotic response within each muscle type. We highlight MYOC as a potentially protective anti-fibrotic, and CILP as a potentially damaging pro-fibrotic gene in DMD.
Myofiber death in DMD has been mainly attributed to necrosis (Bencze, 2023). However, a higher rate of apoptotic nuclei in DMD compared to healthy muscle has been repeatedly observed (Tews and Goebel, 1997;Sandri et al., 1998;Serdaroglu et al., 2002), particularly before necrosis initiates (Tidball et al., 1995). In addition, p53 is one of the most highly induced transcription factors in mdx (Dogra et al., 2008), and its inhibition reduced exercise-induced necrosis in the dystrophic mouse (Waters et al., 2010), suggesting an important role in the mdx pathology. We identify an enrichment of "Regulation Of Apoptotic Process" genes that are differentially expressed, and a 4.2 fold increase in ZNF385A in TA (p-value = 1.76E-102), which is a reported modulator of p53 that reduces pro-apoptotic signaling (Das et al., 2007). This relatively higher expression of ZNF385A is also observed in gracilis (Abbassi-Daloii et al., 2023) and EOM (Porter et al., 2001;Terry et al., 2018) which are protected in DMD compared to the VL. Because of the potential impact of ZNF385A to suppress apoptosis, ZNF385A Frontiers in Genetics frontiersin.org may protect the TA via modulation of p53 signaling towards an antiapoptotic state. Further studies need to be conducted on 1) what are the direct or indirect targets of ZNF385A in human muscle, 2) whether up or downregulating the expression of ZNF385A has an effect on the apoptotic rate in myotubes and muscle stem cells exposed to an apoptotic-inducing condition, and 3) whether inducing its expression in dystrophic myotubes protects myofibers and other resident muscle cells from death. DMD modifier genes were more likely to be differentially expressed between VL and TA, supporting a functional role for several modifiers from this orthogonal transcriptomic study. Identifying novel genetic modifiers of DMD remains a challenge, as studies are limited due to sample size, and thus under-powered to detect genome-wide significance. Thus, augmenting with other data types is relevant to increasing confidence in observed genetic modifiers.
The higher expression of the genetic modifier LTBP4 in TA and its restriction to fibroblasts is consistent with a role in slowing disease progression, as it binds TGF-β and thus reduces TGF-β signaling (Flanigan et al., 2013), a major driver of fibrosis. LTBP4 also had high expression in satellite cells, an unexpected finding. Although TGF-β signaling is known to modulate the muscle stem cell function, the specific role of LTBP4 in satellite cells has not been elucidated. If LTBP4 participates in regulation of satellite cell function, it may create differences in the muscle-specific regenerative ability that needs further exploration, particularly in the context of the protective IAAM haplotype.
Actinin-3 null allele has been previously reported to be protective in DMD via a shift to a more oxidative metabolism (Hogarth et al., 2017) characteristic of slow fibers, which are more protected from loss in DMD. Various studies have examined whether there is an associated change in fiber type composition in the ACTN3 null genotype, with some finding no evidence for a fiber type shift (MacArthur et al., 2008;Broos et al., 2016), and others finding significant differences in the fiber type composition across genotype groups (Vincent et al., 2007). The discrepancies could be due to different sampling methods, such as the number of fibers counted. The higher proportion of slow fibers in XX individuals may be protective because slow fibers are protected for longer in DMD (Webster et al., 1988). We detected previously unreported expression of actinin-3 in slow fibers at low levels, particularly in the RR and RX groups. The presence of actinin-3 in slow fibers in the VL may render VL slow fibers more susceptible to damage by increasing glycolytic and reducing oxidative metabolism.
Differential mutual exclusion of exons 143 and 144 of NEB, as we observed here, has been observed for another pair of human muscles, and gastrocnemius (GN) preferentially includes exon 144 and TA exon 143 (Donner et al., 2004;Lam et al., 2018), the latter being consistent with this study. Similar to the difference in progression between the VL and TA in DMD where the TA is delayed by about 8.5 years, the TA is delayed by 3.4 years relative to GN (Rooney et al., 2020). Because the more affected VL and GN preferentially include E144 and not E143, we hypothesize that E143 included nebulin could plausibly confer different sarcomere properties that result in protection of the muscle membrane to contraction-induced injury in the absence of dystrophin.
Nebulin has various roles in skeletal muscle. Although the most commonly known role is thin filament length regulation and stabilization, it also has been recently found to have roles in modulating contractile force, calcium handling, and the actinmyosin interaction (Chu et al., 2016). Mutations in NEB are the most common cause of autosomal-recessive nemaline myopathy, characterized by Z-disk and thin filament proteins aggregated into nemaline bodies, Z-disk disorganization and consequently, earlyonset muscle weakness that mainly affect proximal muscles (Lehtokari et al., 2014). Homozygous intronic mutations in intron 144, which created an alternative donor (5') splice site in exon 144 and a decrease in NEB expression, were found causal in a case of a 6-year-old boy with general muscle weakness and nemaline bodies consistent with nemaline myopathy (Laflamme et al., 2021). Exons 143 and 144 encode the super repeat region 21 (S21) of nebulin (Lam et al., 2018) and how they differ functionally has not been extensively studied. The only reported difference is in their charge, hydrophobicity and the predicted presence of a protein kinase C phosphorylation site in the E144 but not in the E143 (Donner et al., 2004). The central super repeat region, which has 22 super repeats in total, has been proposed to interact with KLHL40 (Garg et al., 2014). KLHL40 is located in the sarcomere I and A bands, where it binds to nebulin (Garg et al., 2014). Similar to mutations in the NEB exon 143-144 region, KLHL40 deficiency is associated with nemaline myopathy (Garg et al., 2014). These data indicate that the S21 repeat region is critical for proper sarcomere organization, and consequently, muscle function. Nebulin S21 isoforms with different charge and hydrophobicity can potentially modulate the sarcomere organization, structure, and stability and lead to a different susceptibility of the dystrophinglycoprotein-sarcomere link to damage in the absence of dystrophin.
Previous reports on isoform switching across leg muscles identified 200 switching isoforms among 79 genes (Abbassi-Daloii et al., 2023). However, we did not identify any of these isoforms switch events between VL and TA. These findings could be partially attributed to the different skeletal muscles studied, RNA quality and the sequencing library type. In our study, we utilized ribosomal depletion before cDNA synthesis. However, poly(A) libraries can be 3' end biased (Shi et al., 2021) and this can affect isoform quantification.
This study further provides insights into transcriptomic signatures of differentially affected muscle groups, at both the gene and isoform level, and constitutes the first study, to our knowledge, to augment transcriptomic data from different healthy human skeletal muscles using single nuclei transcriptomics to unravel the complexity of tissue heterogeneity and its contribution to intrinsic transcriptomic signatures. To our knowledge, this study also generated the second and largest reported DMD bulk RNAseq dataset, from young ambulatory patients with the same type of DMD mutation (nonsense mutation). An existing dataset of 5 DMD muscle RNAseq (sequenced muscle not specified) can be found in the Sequence Read Archive (SRA) database (PRJNA734152), and RNAseq for four different muscles (1 biceps, 1 quadriceps, 1 gastrocnemius, 1 tibialis anterior) can be found in PRJNA342787. In addition, this is the first throughput dataset of the DMD TA. Although various other datasets of human DMD muscle microarray are found in the Gene Expression Omnibus (GEO) database (GSE3307, GSE109178, GSE6011, Frontiers in Genetics frontiersin.org GSE1004, GSE38417, GSE13608), GSE6011 is the microarray dataset at the earliest stage reported, and it corresponds to the quadriceps (at times used to refer to the VL) at less than 2-year of age. Considering that our DMD TA dataset is from muscle at 2-7 years of age, and that the TA is protected for 8.5 years compared to the VL (Rooney et al., 2020), we estimate that the herein generated TA dataset is the DMD whole muscle transcriptome at the earliest stage of the disease reported to date. Furthermore, the healthy snRNAseq and bulk RNAseq datasets provide useful resources for identification of muscle disease genes through transcriptomics, which require healthy reference materials (Lee et al., 2020). In addition, the snRNAseq dataset could be used to identify splicing factors co-expressed in single cells predominantly expressing different isoforms, and various methods have been developed to overcome the challenges of isoform quantification caused by 3' bias, low sequencing depth and dropout (Huang and Sanguinetti, 2017;Song et al., 2017;Hu et al., 2020;Pan et al., 2021). Establishing a single nuclei atlas of healthy human muscles will allow for a better understanding of muscle-specific responses to lack of dystrophin in particular cell types, how genetic modifiers may influence these, whether there is a preferential responsiveness of specific muscle groups to therapeutic approaches and what the cellular underlying mechanisms are, and how to mimic these intrinsic mechanisms to improve the effectiveness of current therapeutics.

Data availability statement
The raw RNAseq and snRNAseq data generated and analyzed in this study can be found in the Sequence Read Archive (SRA) (BioProject ID: PRJNA976807, https://www.ncbi.nlm.nih.gov/sra/ PRJNA976807). The Seurat object for the snRNAseq data can be found in https://www.synapse.org/#!Synapse:syn51794252/.

Ethics statement
The studies involving human participants were reviewed and approved by UCLA Institutional Review Board (UCLA IRB, #18-001366, #11-001087). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

Author contributions
SN-R, SN, MM and FB contributed to the conception, execution, and design of the study. SN-R wrote the first draft of the manuscript. SN and JW performed the muscle biopsies. FB and SN-R processed and preserved the muscle biopsies. ED consented the healthy volunteers. SN-R and RW extracted the whole tissue RNA. FB performed the muscle sectioning. DS and KC designed the protocol and extracted and sorted the single nuclei. SN-R performed the RNAseq and snRNAseq analyses. SN-R performed the immunofluorescent staining, microscopy, and image analyses with guidance from FB. FG performed the validation of alternative splicing in NEB by RT-PCR. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by the Center for Duchenne Muscular Dystrophy at UCLA. SN-R was supported by NIH T32HG002536 Genomic Analysis and Interpretation Training Grant and the CDMD Azrieli Graduate Award. KC was supported by NIH T32AR065972 Muscle Cell Biology Pathophysiology and Therapeutics Training Grant.